Learning Recurrent Neural Networks with Hessian-Free Optimization
Authors
James Martens, Ilya Sutskever
Abstract
In this work we resolve the long-outstanding problem of how to effectively train recurrent neural networks (RNNs) on complex and difficult sequence modeling problems which may contain long-term data dependencies. Utilizing recent advances in the Hessian-free optimization approach (Martens, 2010), together with a novel damping scheme, we successfully train RNNs on two sets of challenging problems. The first is a collection of pathological synthetic datasets known to be impossible for standard optimization approaches due to their extremely long-term dependencies; the second is three natural and highly complex real-world sequence datasets, on which we find that our method significantly outperforms the previous state-of-the-art method for training neural sequence models: the Long Short-Term Memory approach of Hochreiter and Schmidhuber (1997). Additionally, we offer a new interpretation of the generalized Gauss-Newton matrix of Schraudolph (2002), which is used within the HF approach of Martens.
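To make the approach concrete, the sketch below shows the outer loop common to Hessian-free methods: each update approximately solves the damped linear system (G + λI)δ = −∇f with conjugate gradient, touching the curvature matrix G only through matrix-vector products. This is a minimal illustration, not the authors' implementation: gauss_newton_vec is assumed to be a black box returning Gv, the damping shown is plain Tikhonov damping (the paper's novel structural damping adds a further penalty term), and the Levenberg-Marquardt adjustment of λ is reduced to a comment.

```python
import numpy as np

def conjugate_gradient(mvp, b, x0, max_iters=50, tol=1e-6):
    """Approximately solve A x = b, where A is available only through
    the matrix-vector product mvp(v) = A v."""
    x = x0.copy()
    r = b - mvp(x)                  # residual
    d = r.copy()                    # search direction
    rr = float(r @ r)
    for _ in range(max_iters):
        Ad = mvp(d)
        alpha = rr / float(d @ Ad)  # step length along d
        x = x + alpha * d
        r = r - alpha * Ad
        rr_new = float(r @ r)
        if np.sqrt(rr_new) < tol:
            break
        d = r + (rr_new / rr) * d   # next conjugate direction
        rr = rr_new
    return x

def hf_step(params, grad, gauss_newton_vec, lam, prev_delta):
    """One damped Hessian-free update: solve (G + lam*I) delta = -grad
    with CG, warm-started from a decayed copy of the previous solution.
    In practice lam is then raised or lowered by a Levenberg-Marquardt
    heuristic, based on how well the quadratic model predicted the
    actual reduction in the objective."""
    damped_mvp = lambda v: gauss_newton_vec(v) + lam * v
    delta = conjugate_gradient(damped_mvp, -grad, x0=0.95 * prev_delta)
    return params + delta, delta
```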
Similar resources
Learning Recurrent Neural Networks with Hessian-Free Optimization: Supplementary Materials
2 Details of the pathological synthetic problems
2.1 The addition, multiplication, and XOR problem
2.2 The temporal order problem
2.3 The 3-bit temporal order problem
2.4 The random permutation problem
2.5 Noiseless memorization ...
Hessian-free Optimization for Learning Deep Multidimensional Recurrent Neural Networks
Multidimensional recurrent neural networks (MDRNNs) have shown remarkable performance in the area of speech and handwriting recognition. The performance of an MDRNN is improved by further increasing its depth, and the difficulty of learning the deeper network is overcome by using Hessian-free (HF) optimization. Given that connectionist temporal classification (CTC) is utilized as an objective...
Training Neural Networks with Stochastic Hessian-Free Optimization
Hessian-free (HF) optimization has been successfully used for training deep autoencoders and recurrent networks. HF uses the conjugate gradient algorithm to construct update directions through curvature-vector products that can be computed on the same order of time as gradients. In this paper we exploit this property and study stochastic HF with gradient and curvature mini-batches independent o...
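The curvature-vector products mentioned in that abstract are typically generalized Gauss-Newton products, Gv = Jᵀ H_L J v, where J is the Jacobian of the network outputs with respect to the parameters and H_L is the Hessian of the loss with respect to the outputs. Below is a hedged sketch in JAX; the names f (the network function) and loss are placeholders for an arbitrary model, not any particular library API. It uses one forward-mode pass, one small Hessian-vector product at the outputs, and one reverse-mode pass, which is indeed on the same order of cost as a gradient.

```python
import jax

def gauss_newton_vector_product(f, loss, params, inputs, targets, v):
    """Compute G v = J^T H_L J v for outputs = f(params, inputs) and a
    scalar loss(outputs, targets). Costs roughly a constant multiple of
    one gradient evaluation."""
    # J v: forward-mode product of the network Jacobian with v.
    outputs, Jv = jax.jvp(lambda p: f(p, inputs), (params,), (v,))
    # H_L (J v): Hessian of the loss w.r.t. the outputs, applied to Jv.
    HJv = jax.jvp(jax.grad(lambda o: loss(o, targets)),
                  (outputs,), (Jv,))[1]
    # J^T (H_L J v): reverse-mode pullback through the network.
    _, vjp_fn = jax.vjp(lambda p: f(p, inputs), params)
    return vjp_fn(HJv)[0]
```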
On the Efficiency of Recurrent Neural Network Optimization Algorithms
This study compares the sequential and parallel efficiency of training Recurrent Neural Networks (RNNs) with Hessian-free optimization versus a gradient descent variant. Experiments are performed using the long short-term memory (LSTM) architecture and the newly proposed multiplicative LSTM (mLSTM) architecture. The results provide a number of insights into these architectures and optimization ...
On the importance of initialization and momentum in deep learning
Deep and recurrent neural networks (DNNs and RNNs respectively) are powerful models that were considered to be almost impossible to train using stochastic gradient descent with momentum. In this paper, we show that when stochastic gradient descent with momentum uses a well-designed random initialization and a particular type of slowly increasing schedule for the momentum parameter, it can train...
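For reference, here is a minimal sketch of the kind of slowly increasing momentum schedule that abstract refers to, paired with classical momentum updates. The function names and the specific constants (the 250-step doubling interval, the 0.99 cap) are illustrative choices patterned on schedules reported for this approach, not a definitive reproduction.

```python
import numpy as np

def momentum_schedule(t, mu_max=0.99):
    """Slowly increasing momentum: mu_t rises from 0.5 toward mu_max as
    training progresses (illustrative form and constants)."""
    return min(1.0 - 2.0 ** (-1 - np.log2(np.floor(t / 250.0) + 1)),
               mu_max)

def sgd_with_momentum(grad_fn, theta, lr=0.01, steps=1000):
    """Classical momentum: v <- mu*v - lr*grad; theta <- theta + v."""
    v = np.zeros_like(theta)
    for t in range(1, steps + 1):
        mu = momentum_schedule(t)
        v = mu * v - lr * grad_fn(theta)
        theta = theta + v
    return theta
```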
Publication year: 2011